Abstract — In practical asynchronous bi-directional relaying, 
symbols transmitted by two sources cannot arrive at the relay 
with perfect frame and symbol alignments and the asynchronous 
multiple-access channel (MAC) should be seriously considered. 
Recently, Lu et al. proposed a Tanner-graph representation of 
the symbol-asynchronous MAC with rectangular-pulse shaping 
and further developed the message-passing algorithm for optimal 
decoding of the symbol-asynchronous physical-layer network 
coding. In this paper, we present a general channel model for 
the asynchronous MAC with arbitrary pulse-shaping. Then, the 
Bahl, Cocke, Jelinek, and Raviv (BCJR) algorithm is developed 
for optimal decoding of the asynchronous MAC channel. For 
Low-Density Parity-Check (LDPC)-coded BPSK signalling over 
the symbol-asynchronous MAC, we present a formal log-domain 
generalized sum-product-algorithm (Log-G-SPA) for efficient 
decoding. Furthermore, we propose to use cyclic codes for 
combating the frame-asynchronism and the resolution of the 
relative delay inherent in this approach can be achieved by 
employing the simple cyclic-redundancy-check (CRC) coding 
technique. Simulation results demonstrate the effectiveness of the 
proposed approach. 

Index Terms — asynchronous bi-directional relaying, network 
coding, BCJR algorithm, cyclic codes, LDPC codes. 



I. Introduction 

NETWORK coding has shown its power for disseminating 
information over networks [1], [2]. For wireless cooperative 
networks, there is increasing interest in employing the 
idea of network coding to improve the throughput of the 
network. Indeed, the gain is very impressive for the special 
bi-directional relaying scenarios with two-way or multi-way 
traffic, as addressed in [3]. 

For bi-directional relaying, two sources A and B want to 
exchange information with each other with the help of a relay 
node R, as shown in Fig. 1. Traditionally, this is achieved 
in four steps. Recently, it was recognized that only two 
steps are essentially required when the 
powerful idea of physical-layer network coding (PNC) [4] is employed. In 
particular, the superimposed signal received at the relay can 
be viewed as a physically combined, network-coded form 
of the two source messages, further impaired by the channel 
noise. Hence, physical-layer network coding can 
be employed to improve the throughput of bi-directional 
relaying. 




Step 1: Multiple Access Phase. Step 2: Broadcast Phase. 

Fig. 1. Bi-directional relaying with PNC. 

For bi-directional relaying with PNC, it is assumed that 
communication takes place in two phases - a multiple access 
phase and a broadcast phase as shown in Fig. 1. In the 
first phase, the two source nodes send signals simultaneously 
to the relay. In the second phase, the relay processes the 
superimposed signal of the simultaneous packets and maps 
them to a network-coded (XOR) packet for broadcast back to 
the source nodes. Then, both sources can retrieve their own 
information as they know completely what they have sent. 
Compared with the traditional relay system, PNC doubles the 
throughput of the two-way relay channel. 
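
The two-phase exchange described above can be illustrated with a minimal Python sketch. It ignores the channel entirely and assumes the relay has already mapped the superimposed signal to the XOR packet; the packet contents and all names (`xor_bits`, `packet_a`, `packet_b`) are hypothetical and serve only to show why each source can recover the other's data.

```python
def xor_bits(x, y):
    """Bitwise XOR of two equal-length bit lists."""
    return [a ^ b for a, b in zip(x, y)]

# Hypothetical 8-bit packets held by sources A and B.
packet_a = [1, 0, 1, 1, 0, 0, 1, 0]
packet_b = [0, 1, 1, 0, 0, 1, 1, 1]

# Broadcast phase: the relay sends only the network-coded (XOR) packet.
relay_packet = xor_bits(packet_a, packet_b)

# Each source removes its own packet to obtain the other one.
recovered_at_a = xor_bits(relay_packet, packet_a)  # estimate of B's packet
recovered_at_b = xor_bits(relay_packet, packet_b)  # estimate of A's packet
```

Only one broadcast transmission is needed because XOR is its own inverse.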

To be more practical, channel coding should be employed 
to further improve the reliability of the system. In [5], [6], 
joint channel decoding and physical-layer network coding 
(JCNC) was introduced. It was recognized that, with 
the same linear channel code at both source nodes, the 
XOR of both source codewords is still a valid codeword. 
Thus, the received signal can be decoded to the XOR of the 
source information at the relay without changing the decoding 
algorithm. In [7], we derived the closed-form expression for 
computing the log-likelihood ratios (LLRs) of the network-coded 
codeword for a complex multiple-access channel, and 
it was revealed that the equivalent channel observed at the 
relay is an asymmetric channel. Although this approach 
can be efficiently implemented, it does result in a performance 
loss because it uses only $\Pr(c_a(k) \oplus c_b(k) \mid r_k)$ $^{1}$, while the 
joint probabilities $\Pr(c_a(k), c_b(k) \mid r_k)$ are not fully used. In 
[5], a novel decoding scheme based on the arithmetic sum 
of the source codewords was proposed for repeat-accumulate 
(RA) codes. A generalized sum-product algorithm (G-SPA) 
over the Galois field GF($2^2$) was proposed in [8] for the LDPC-coded 
BPSK system; it works directly with the joint 
probabilities $\Pr(c_a(k), c_b(k) \mid r_k)$, and a significant gain was 
observed compared to the JCNC approach. Its extension to 
QPSK signalling was developed in [9]. 

$^{1}$ For the proper definition, we refer readers to Section II. 

A key issue in practical PNC is how to deal with the 
asynchrony between the signals transmitted by the two source 
nodes. That is, symbols transmitted by the two source nodes 
could arrive at the receiver with both symbol and frame 
misalignments. 

In [10], Lu et al. proposed a Tanner-graph representation of 
the symbol-asynchronous multiple-access channel (MAC) with 
rectangular-pulse shaping and also developed the message-passing 
algorithm for optimal decoding of asynchronous 
physical-layer network coding. Furthermore, the message-passing 
algorithm was also developed for the case where RA codes are 
employed. This message-passing algorithm is, in essence, the 
cascade of the BCJR algorithm for the asynchronous physical-layer 
network coding and the G-SPA [8] over the underlying 
Tanner graph of the specified RA code. However, its current 
form is in the probability domain, with channel coding restricted 
to the special RA codes, which is not desirable in 
practice. 

In this paper, we provide further insights into 
asynchronous bi-directional relaying. In particular, the general 
asynchronous MAC with arbitrary pulse-shaping is 
developed and its connection to rectangular-pulse shaping 
[10] is discussed. Then, the BCJR formulation of the asynchronous 
MAC is proposed, which can shed light on 
various practical algorithms suitable for implementation. For 
LDPC-coded BPSK signalling over the symbol-asynchronous 
MAC, we present a formal log-domain generalized sum-product 
algorithm (Log-G-SPA) for efficient decoding. Furthermore, 
we propose to use cyclic codes for combating the 
frame asynchronism, and the related problem of delay resolution 
is discussed in detail. 

The rest of the paper is organized as follows. In Section 
II, a general channel model for asynchronous physical-layer 
network coding is developed. We then formulate the Log-G-SPA 
decoding for joint LDPC and PNC over asynchronous 
MACs in Section III. Section IV addresses the problem of frame 
asynchronism. Simulation results are provided in Section V, 
and the conclusion is given in Section VI. 

II. General Channel Model for Asynchronous 
Physical-layer Network Coding 

A. Asynchronous Multiple Access Channel Model 

During the multiple-access phase, the source nodes A and 
B transmit the modulated signals $x_a(t)$ and $x_b(t)$ to the relay 
simultaneously. For a general continuous-time multiple-access 
channel, the received signal at the relay can be expressed as 

\[
\begin{aligned}
y(t) &= h_a x_a(t) + h_b x_b(t) + w(t) \\
     &= \sum_{k=0}^{\infty} h_a c_a(k)\, g_a(t - kT - \tau_a)
      + \sum_{k=0}^{\infty} h_b c_b(k)\, g_b(t - kT - \tau_b) + w(t),
\end{aligned}
\tag{1}
\]

where the delays $\tau_a \in [0,T)$, $\tau_b \in [0,T)$ account for the 
symbol asynchronism between source nodes A and B, $w(t)$ is 
complex white Gaussian noise with power spectral density 
equal to $\sigma^2$, the channel coefficients $h_a, h_b$ are complex channel 
gains that remain fixed during transmission, $\{c_a(k)\}, \{c_b(k)\}$ 
are the modulated sequences, and $g_a(t), g_b(t)$ are normalized 
pulse-shaping functions ($\int_{-\infty}^{\infty} |g_a(t)|^2\, dt = 1$) for source nodes 
A and B, respectively. In this section, we focus on the symbol 
asynchronism. Without loss of generality, we assume that 
$0 \le \tau_a \le \tau_b < T$ and that both delays are known to the receiver. 
The frame asynchronism is considered in Section IV. 

By passing the observations through two matched filters for 
the signals $x_a(t)$ and $x_b(t)$, respectively, one obtains the following 
discrete-time samples: 

\[
\begin{aligned}
y_a(k) &= \int_{-\infty}^{\infty} y(t)\, g_a^{*}(t - kT - \tau_a)\, dt, \\
y_b(k) &= \int_{-\infty}^{\infty} y(t)\, g_b^{*}(t - kT - \tau_b)\, dt.
\end{aligned}
\tag{2}
\]

The discrete samples 
$\{[y_a(k), y_b(k)]^T\}$ are sufficient statistics for maximum a 
posteriori (MAP) symbol detection, as explained in [11]. By 
substituting (1) into (2), it follows that 



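\[
\begin{aligned}
y_a(k) &= h_a c_a(k) + \sum_{l} h_b\, \rho_{ab}(l)\, c_b(k-l) + w_a(k), \\
y_b(k) &= h_b c_b(k) + \sum_{l} h_a\, \rho_{ba}(l)\, c_a(k-l) + w_b(k),
\end{aligned}
\tag{3}
\]

where 

\[
\begin{aligned}
\rho_{ab}(l) &= \int_{-\infty}^{\infty} g_a^{*}(t)\, g_b(t + lT + \tau_a - \tau_b)\, dt, \\
\rho_{ba}(l) &= \int_{-\infty}^{\infty} g_b^{*}(t)\, g_a(t + lT + \tau_b - \tau_a)\, dt,
\end{aligned}
\tag{4}
\]

and 

\[
\begin{aligned}
w_a(k) &= \int_{-\infty}^{\infty} w(t)\, g_a^{*}(t - kT - \tau_a)\, dt, \\
w_b(k) &= \int_{-\infty}^{\infty} w(t)\, g_b^{*}(t - kT - \tau_b)\, dt.
\end{aligned}
\tag{5}
\]
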

One can also rewrite (3) in matrix form, as given in (6). 

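Here, the discrete random process $\{[w_a(k), w_b(k)]^T\}$ is 
Gaussian with zero mean and covariance matrix 

\[
E\left\{
\begin{bmatrix} w_a(k) \\ w_b(k) \end{bmatrix}
\begin{bmatrix} w_a^{*}(j) & w_b^{*}(j) \end{bmatrix}
\right\} = \sigma^2 \Lambda(k-j),
\tag{7}
\]

where $\Lambda(k) = 0$ if $|k| > L$, and $\Lambda(0)$, $\Lambda(l)$, $\Lambda(-l)$ are given 
as follows: 

\[
\Lambda(0) =
\begin{bmatrix} 1 & \rho_{ab}(0) \\ \rho_{ba}(0) & 1 \end{bmatrix},
\tag{8}
\]

\[
\Lambda(l) = \Lambda^{\dagger}(-l) =
\begin{bmatrix} 0 & \rho_{ab}(l) \\ \rho_{ba}(l) & 0 \end{bmatrix},
\quad l = 1, \cdots, L.
\tag{9}
\]

Here, $L$ denotes the memory length of the channel, which is 
determined by the correlation of the pulse-shaping functions. 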
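\[
\begin{bmatrix} y_a(k) \\ y_b(k) \end{bmatrix}
=
\begin{bmatrix} h_a & h_b \rho_{ab}(0) \\ h_a \rho_{ba}(0) & h_b \end{bmatrix}
\begin{bmatrix} c_a(k) \\ c_b(k) \end{bmatrix}
+ \sum_{l \neq 0}
\begin{bmatrix} 0 & h_b \rho_{ab}(l) \\ h_a \rho_{ba}(l) & 0 \end{bmatrix}
\begin{bmatrix} c_a(k-l) \\ c_b(k-l) \end{bmatrix}
+
\begin{bmatrix} w_a(k) \\ w_b(k) \end{bmatrix}
\tag{6}
\]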
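
To make the cross-correlation coefficients in (4) concrete, the following Python sketch evaluates $\rho_{ab}(l)$ numerically for the unit-energy rectangular pulse treated in Section II-B. The midpoint-rule integration, the function names, and the example delays ($\tau_a = 0$, $\tau_b = 0.25T$) are assumptions made for illustration only; for this pulse one expects $\rho_{ab}(0) = 1 - (\tau_b - \tau_a)/T$ and $\rho_{ab}(1) = (\tau_b - \tau_a)/T$, so that $L = 1$.

```python
import math

def rect_pulse(t, T=1.0):
    """Unit-energy rectangular pulse supported on [0, T)."""
    return 1.0 / math.sqrt(T) if 0.0 <= t < T else 0.0

def cross_correlation(l, tau_a, tau_b, T=1.0, steps=2000):
    """Midpoint-rule approximation of rho_ab(l) in (4) for real pulses.

    g_a(t) vanishes outside [0, T), so it suffices to integrate there.
    """
    dt = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += rect_pulse(t, T) * rect_pulse(t + l * T + tau_a - tau_b, T) * dt
    return total

# Hypothetical delays: source B lags source A by a quarter symbol.
rho = {l: cross_correlation(l, tau_a=0.0, tau_b=0.25) for l in (-1, 0, 1, 2)}
```

Only `rho[0]` and `rho[1]` are non-zero here, which matches the single-symbol memory of the rectangular-pulse model.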



For convenience of the MAP detection, the whitened 
matched filter (WMF) is often employed for transforming the 
received signal into a discrete time sequence with minimum- 
phase channel response and white noise. This procedure often 
simplifies analysis and is a first step in the implementation of 
some estimators, including the maximum-likelihood sequence 
estimation detector (MLSE) and the MAP detector. The WMF 
is determined by factoring the channel spectrum into a product 
of a minimum phase filter and its time inverse. 

Let $\Omega(z)$ be the (two-sided) z-transform of the sampled 
autocorrelation sequence $\Lambda(k)$, i.e., 

\[
\Omega(z) = \sum_{k=-L}^{L} \Lambda(k)\, z^{-k}.
\tag{10}
\]

By noting the property (9), it follows that $\Omega(z)$ can be factored as 

\[
\Omega(z) = F^{\dagger}(z^{-1})\, F(z).
\tag{11}
\]

By invoking the spectral factorization theorem, it is 
possible to find a physically realizable, stable discrete-time filter 
$\left(F^{\dagger}(z^{-1})\right)^{-1}$ that transforms a colored random process 
into a white random process. 

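In [12], it has been shown that $F(z)$ has the form 

\[
F(z) = \sum_{l=0}^{L} F_l\, z^{-l},
\tag{12}
\]

where 

\[
F_l =
\begin{bmatrix} f_{aa}^{l} & f_{ab}^{l} \\ f_{ba}^{l} & f_{bb}^{l} \end{bmatrix}.
\tag{13}
\]
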

Consequently, passing the received vector sequence 
$\{\mathbf{y}(k) = [y_a(k), y_b(k)]^T\}$ through the digital filter 
$\left(F^{\dagger}(z^{-1})\right)^{-1}$ results in an output vector sequence 
$\{\mathbf{r}(k)\}$ that can be expressed as in (14). 
Now, the discrete random process $\{\mathbf{n}(k) = [n_a(k), n_b(k)]^T\}$ 
is a zero-mean white Gaussian process with covariance $\sigma^2 \mathbf{I}$. 

Let $\mathbf{c}_{ab}(k) = [c_a(k), c_b(k)]^T$ and $\mathbf{r}(k) = [r_a(k), r_b(k)]^T$. Then, the 
formulation (14) can be elegantly expressed as 

\[
\mathbf{r}(k) = \Psi\left(\mathbf{c}_{ab}(k), \cdots, \mathbf{c}_{ab}(k-L)\right) + \mathbf{n}(k)
\triangleq \Psi\left(\{\mathbf{c}_{ab}(k-l)\}_{l=0}^{L}\right) + \mathbf{n}(k).
\tag{15}
\]

It is clear that the function $\Psi(\cdot)$ is linear. By assuming 
ideal knowledge of $\Psi(\cdot)$ and $\sigma^2$, the asynchronous 
MAC can be modeled as a vector inter-symbol interference 
(ISI) channel. To estimate the a posteriori probability (APP) 
$\Pr(\mathbf{c}_{ab}(k) \mid \mathbf{r}_0^{N-1})$, the BCJR algorithm can be naturally employed. 
Here, $N$ denotes the observation length at the relay. 



B. Rectangular-pulse shaping 

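Let $\delta = \frac{\tau_b - \tau_a}{T}$ denote the relative delay between source 
nodes A and B. For the rectangular pulse-shaping functions 
$g_a(t), g_b(t)$, i.e., $g_a(t) = g_b(t) = u(t) - u(t - T)$ with $u(t)$ 
denoting the unit step function, the authors in [10] proposed 
to consider the following discrete-time samples: 

\[
\begin{aligned}
y_e(k) &= \frac{1}{\delta} \int_{kT+\tau_a}^{kT+\tau_b} y(t)\, g(t - kT - \tau_a)\, dt, \\
y_o(k) &= \frac{1}{1-\delta} \int_{kT+\tau_b}^{(k+1)T+\tau_a} y(t)\, g(t - kT - \tau_a)\, dt.
\end{aligned}
\tag{16}
\]

The matched-filter outputs (2) are related 
to (16) as follows: 

\[
\begin{aligned}
y_a(k) &= \delta\, y_e(k) + (1 - \delta)\, y_o(k), \\
y_b(k) &= (1 - \delta)\, y_o(k) + \delta\, y_e(k+1).
\end{aligned}
\tag{17}
\]

Hence, the samples $\{[y_e(k), y_o(k)]^T\}$ are also sufficient 
statistics for MAP detection. By combining (1) and (16), 
it follows that 

\[
\begin{aligned}
y_e(k) &= h_a c_a(k) + h_b c_b(k-1) + w_e(k), \\
y_o(k) &= h_a c_a(k) + h_b c_b(k) + w_o(k),
\end{aligned}
\tag{18}
\]

where $w_e(k)$ and $w_o(k)$ are independent zero-mean complex 
Gaussian random variables. Hence, 
one can write (18) in the following matrix form: 

\[
\begin{bmatrix} y_e(k) \\ y_o(k) \end{bmatrix}
=
\begin{bmatrix} 0 & h_b \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} c_a(k-1) \\ c_b(k-1) \end{bmatrix}
+
\begin{bmatrix} h_a & 0 \\ h_a & h_b \end{bmatrix}
\begin{bmatrix} c_a(k) \\ c_b(k) \end{bmatrix}
+
\begin{bmatrix} w_e(k) \\ w_o(k) \end{bmatrix}.
\tag{19}
\]

Hence, the equivalent ISI channel model (14) is still valid. 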
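
As a quick consistency check between (17) and (18), the following Python sketch generates noise-free samples $y_e(k)$, $y_o(k)$ and recombines them into the matched-filter output $y_a(k)$. The helper names, the BPSK sequences, the channel gains, and the convention $c_b(-1) = 0$ for the first symbol are illustrative assumptions, not part of the original model.

```python
import cmath

def rect_samples(c_a, c_b, h_a, h_b):
    """Noise-free y_e(k), y_o(k) following (18); c_b(-1) is taken as 0."""
    y_e, y_o = [], []
    for k in range(len(c_a)):
        prev_b = c_b[k - 1] if k > 0 else 0.0
        y_e.append(h_a * c_a[k] + h_b * prev_b)
        y_o.append(h_a * c_a[k] + h_b * c_b[k])
    return y_e, y_o

def matched_filter_a(y_e, y_o, delta):
    """First line of (17): y_a(k) = delta*y_e(k) + (1 - delta)*y_o(k)."""
    return [delta * e + (1.0 - delta) * o for e, o in zip(y_e, y_o)]

# Hypothetical parameters for the check.
delta = 0.3
h_a, h_b = 1.0, cmath.exp(1j * cmath.pi / 4)
c_a = [1, -1, 1, 1]
c_b = [-1, 1, 1, -1]

y_e, y_o = rect_samples(c_a, c_b, h_a, h_b)
y_a = matched_filter_a(y_e, y_o, delta)

# Direct evaluation: source B leaks into y_a(k) with weights (1 - delta)
# on c_b(k) and delta on c_b(k-1).
expected = [
    h_a * c_a[k] + h_b * ((1.0 - delta) * c_b[k]
                          + delta * (c_b[k - 1] if k > 0 else 0.0))
    for k in range(len(c_a))
]
```

The agreement of `y_a` and `expected` reflects that, for rectangular pulses, source B contributes to $y_a(k)$ only through $c_b(k)$ and $c_b(k-1)$.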

C. BCJR Algorithm 

In this subsection, we formulate the BCJR algorithm [13], 
which is known to be optimal in implementing MAP 
symbol detection for linear channels with finite memory. 

Let us define, at time epoch $k$, the state $s_k$ as 

\[
s_k = \left(\mathbf{c}_{ab}(k-1), \cdots, \mathbf{c}_{ab}(k-L)\right)
\tag{20}
\]

and the branch metric function as 

\[
\gamma_k\left(s_k, \mathbf{c}_{ab}(k)\right) \propto \Pr\left(\mathbf{c}_{ab}(k)\right)
\cdot \exp\left( -\frac{\left\| \mathbf{r}(k) - \Psi\left(\{\mathbf{c}_{ab}(k-l)\}_{l=0}^{L}\right) \right\|^2}{\sigma^2} \right).
\tag{21}
\]

The BCJR algorithm is characterized by the following forward 
and backward recursions: 

\[
\alpha_{k+1}(s_{k+1}) = \sum_{\mathbf{c}_{ab}(k)} \sum_{s_k}
T\left(\mathbf{c}_{ab}(k), s_k, s_{k+1}\right)\, \alpha_k(s_k)\, \gamma_k\left(s_k, \mathbf{c}_{ab}(k)\right),
\tag{22}
\]
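\[
\begin{bmatrix} r_a(k) \\ r_b(k) \end{bmatrix}
= \sum_{l=0}^{L}
\begin{bmatrix} h_a f_{aa}^{l} & h_b f_{ab}^{l} \\ h_a f_{ba}^{l} & h_b f_{bb}^{l} \end{bmatrix}
\begin{bmatrix} c_a(k-l) \\ c_b(k-l) \end{bmatrix}
+
\begin{bmatrix} n_a(k) \\ n_b(k) \end{bmatrix}
\tag{14}
\]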



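where $T(\mathbf{c}_{ab}(k), s_k, s_{k+1})$ is the trellis indicator function, 
which is equal to 1 if $\mathbf{c}_{ab}(k)$, $s_k$, and $s_{k+1}$ satisfy the trellis 
constraint and 0 otherwise; 

\[
\beta_k(s_k) = \sum_{\mathbf{c}_{ab}(k)} \sum_{s_{k+1}}
T\left(\mathbf{c}_{ab}(k), s_k, s_{k+1}\right)\, \beta_{k+1}(s_{k+1})\, \gamma_k\left(s_k, \mathbf{c}_{ab}(k)\right).
\tag{23}
\]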



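Then, the joint APPs $\Pr(\mathbf{c}_{ab}(k) \mid \mathbf{r}_0^{N-1})$ can be calculated as 

\[
\Pr\left(\mathbf{c}_{ab}(k) \mid \mathbf{r}_0^{N-1}\right)
= \sum_{s_{k+1}} T\left(\mathbf{c}_{ab}(k), s_{k+1}\right)\, \alpha_{k+1}(s_{k+1})\, \beta_{k+1}(s_{k+1}),
\tag{24}
\]

where the indicator function $T(\mathbf{c}_{ab}(k), s_{k+1})$ is equal to 1 if 
$s_{k+1}$ is compatible with $\mathbf{c}_{ab}(k)$ and 0 otherwise. 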

For rectangular-pulse shaping, it should be pointed out that 
$L = 1$ and the value of $\Psi(\mathbf{c}_{ab}(k), \mathbf{c}_{ab}(k-1))$ is independent 
of $c_a(k-1)$; hence the state $s_k$ can be further simplified as 
$s_k = (c_b(k-1))$. 
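
For the rectangular-pulse case, the forward and backward recursions reduce to a two-state trellis. The Python sketch below is one possible probability-domain realization operating on the samples of (18), not the authors' implementation. It assumes BPSK symbols, uniform priors, a known initial state $c_b(-1) = +1$, and noise variances `var_e`, `var_o` supplied by the caller; because the simplified state no longer contains $c_a(k)$, the joint APP is formed in the standard way as $\alpha_k \gamma_k \beta_{k+1}$ per branch. All names are hypothetical.

```python
import cmath
import math

SYMS = (+1, -1)  # BPSK alphabet

def bcjr_rect(y_e, y_o, h_a, h_b, var_e, var_o, first_state=+1):
    """Joint APPs of (c_a(k), c_b(k)) for the model (18); state = c_b(k-1)."""
    n = len(y_e)

    def gamma(k, s, ca, cb):
        # Branch metric with a uniform prior of 1/4 on (ca, cb).
        d_e = abs(y_e[k] - h_a * ca - h_b * s) ** 2 / var_e
        d_o = abs(y_o[k] - h_a * ca - h_b * cb) ** 2 / var_o
        return 0.25 * math.exp(-d_e - d_o)

    def normalized(d):
        total = sum(d.values())
        return {key: val / total for key, val in d.items()}

    # Forward recursion; the next state is c_b(k).
    alpha = [{s: float(s == first_state) for s in SYMS}]
    for k in range(n):
        nxt = {s: 0.0 for s in SYMS}
        for s in SYMS:
            for ca in SYMS:
                for cb in SYMS:
                    nxt[cb] += alpha[k][s] * gamma(k, s, ca, cb)
        alpha.append(normalized(nxt))

    # Backward recursion, started from a uniform terminal state.
    beta = [None] * n + [{s: 1.0 for s in SYMS}]
    for k in range(n - 1, -1, -1):
        beta[k] = normalized({
            s: sum(gamma(k, s, ca, cb) * beta[k + 1][cb]
                   for ca in SYMS for cb in SYMS)
            for s in SYMS})

    # Joint APPs per symbol pair.
    apps = []
    for k in range(n):
        apps.append(normalized({
            (ca, cb): beta[k + 1][cb]
            * sum(alpha[k][s] * gamma(k, s, ca, cb) for s in SYMS)
            for ca in SYMS for cb in SYMS}))
    return apps

# Noise-free toy example with a pi/4 phase offset between the sources.
c_a = [1, -1, -1, 1, 1, -1]
c_b = [-1, -1, 1, 1, -1, 1]
h_a, h_b = 1.0, cmath.exp(1j * cmath.pi / 4)
prev_b = [1] + c_b[:-1]
y_e = [h_a * a + h_b * p for a, p in zip(c_a, prev_b)]
y_o = [h_a * a + h_b * b for a, b in zip(c_a, c_b)]
apps = bcjr_rect(y_e, y_o, h_a, h_b, var_e=0.2, var_o=0.2)
decisions = [max(p, key=p.get) for p in apps]
```

A log-domain version would replace the products by sums and the sums by log-sum-exp operations, which avoids underflow at high SNR.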

As in [14], the proposed BCJR algorithm can be 
implemented efficiently in the log domain, i.e., as the Log-BCJR 
algorithm (or the Log-MAP algorithm). In what follows, we 
denote the Log-BCJR algorithm by 
$\mathcal{B}_{\mathrm{MAC}}\left(\mathbf{r}_0^{N-1}, L_i(\mathbf{c}_{ab}(k)), L_e(\mathbf{c}_{ab}(k))\right)$, 
where $L_i(\mathbf{c}_{ab}(k))$ denotes the a priori information 
and $L_e(\mathbf{c}_{ab}(k))$ the a posteriori extrinsic information, 
both in the log domain. Further simplification of the Log-MAP 
algorithm, such as the Max-Log-MAP algorithm, is also 
straightforward, at the cost of some potential performance loss. 

With the joint APPs $\Pr(\mathbf{c}_{ab}(k) \mid \mathbf{r}_0^{N-1})$, one can calculate 
the APPs of the XOR codeword, $\Pr(c_a(k) \oplus c_b(k) \mid \mathbf{r}_0^{N-1})$, for 
physical-layer network coding. If both sources A and B use 
the same linear channel code, the relay node can make use 
of $\Pr(c_a(k) \oplus c_b(k) \mid \mathbf{r}_0^{N-1})$ to perform channel decoding and 
obtain the pairwise XOR of the source symbols. However, this 
disjoint channel-decoding and network-coding scheme, i.e., the 
JCNC scheme, performs worse than the joint channel-decoding 
and network-coding scheme, i.e., the G-SPA scheme [8], [10]. 
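
The marginalization from joint APPs to XOR APPs used by the JCNC scheme can be sketched as follows. The dictionary layout (bit pairs mapped to probabilities) and the function names are assumptions chosen for illustration; the example probabilities are made up.

```python
import math

def xor_app(joint):
    """Collapse joint APPs of (c_a, c_b) into APPs of c_a XOR c_b.

    `joint` maps bit pairs (c_a, c_b) to probabilities summing to one.
    """
    p0 = joint[(0, 0)] + joint[(1, 1)]  # XOR equals 0
    p1 = joint[(0, 1)] + joint[(1, 0)]  # XOR equals 1
    return p0, p1

def xor_llr(joint):
    """Log-likelihood ratio ln(Pr(XOR = 0) / Pr(XOR = 1))."""
    p0, p1 = xor_app(joint)
    return math.log(p0 / p1)

# Hypothetical joint APP for one symbol position.
joint = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}
p0, p1 = xor_app(joint)
llr = xor_llr(joint)
```

The collapse discards how the probability mass is split between (0, 0) and (1, 1), which is the information the G-SPA retains.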

III. Log-G-SPA Decoding of Joint LDPC and PNC 
over the Asynchronous MAC 

In this paper, LDPC coding 
is assumed for both sources A and B. Let $\mathcal{C}_a$ be an $(N, K_a)$ 
LDPC code of block length $N$ and dimension $K_a$ for source 
A, which has a parity-check matrix $\mathbf{H}_a = [h_{m,n}]$ of $M$ 
rows and $N$ columns. Let $R_a = K_a/N$ denote its code rate. 
Correspondingly, we can define the code $\mathcal{C}_b$ with a parity-check 
matrix $\mathbf{H}_b$ for source B. 

For any given LDPC-encoded vector $\mathbf{c}_a = 
(c_a(0), c_a(1), \cdots, c_a(N-1))^T$ for source A and 
$\mathbf{c}_b = (c_b(0), c_b(1), \cdots, c_b(N-1))^T$ for source B, we have 

\[
\mathbf{H}_a \mathbf{c}_a = \mathbf{0}, \qquad \mathbf{H}_b \mathbf{c}_b = \mathbf{0}.
\tag{25}
\]


For joint LDPC and physical-layer network coding, we consider 
the use of the same LDPC code at both sources 
A and B. In this case, we have $\mathbf{H}_a = \mathbf{H}_b = \mathbf{H} = [h_{m,n}]$ 
and 

\[
\mathbf{H}(\mathbf{c}_a \oplus \mathbf{c}_b) = \mathbf{0}.
\tag{26}
\]
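
Property (26) is simply the closure of a linear code under XOR. The following Python sketch checks it exhaustively for a small (7,4) Hamming code, which stands in here for the LDPC code purely for illustration; the matrix `H` and all names are assumptions and are not taken from the simulations in this paper.

```python
from itertools import product

# Parity-check matrix of a (7,4) Hamming code (toy stand-in for H).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(H, c):
    """Binary syndrome H c over GF(2)."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]

# Enumerate all codewords, i.e., all vectors with an all-zero syndrome.
codebook = {c for c in product((0, 1), repeat=7) if not any(syndrome(H, c))}

# XOR of any two codewords must again be a codeword.
closed = all(
    tuple(a ^ b for a, b in zip(ca, cb)) in codebook
    for ca in codebook
    for cb in codebook
)
```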



The relay R tries to decode $\mathbf{c}_r = \mathbf{c}_a \oplus \mathbf{c}_b$. During the 
broadcast phase, the relay transmits the XOR codeword $\mathbf{c}_r$ to 
both sources A and B. Then, both sources A and B decode 
$\mathbf{c}_r = \mathbf{c}_a \oplus \mathbf{c}_b$ based on the received signal vector and, since 
they know $\mathbf{c}_a$ and $\mathbf{c}_b$, they can obtain $\mathbf{c}_b$ and $\mathbf{c}_a$, respectively. 
Hence, the bottleneck is the decoding of $\mathbf{c}_r$ at the relay node during 
the multiple-access phase. 

Instead of decoding the source signals separately or 
decoding the XOR directly, the authors in [8] propose to decode the two 
codes jointly with a generalized sum-product algorithm (G-SPA). 
With this G-SPA decoding, the received superimposed 
signal is first decoded to $\{\mathbf{c}_{ab}(k)\}$ over the Galois 
field GF($2^2$) for BPSK signalling, and then the XOR 
rule is applied before transmission to both sources. This 
approach exploits almost all available information about the 
superimposed received signal as well as the code structure; 
hence it can achieve excellent performance. 

In what follows, we present a log-domain form of the G-SPA (Log-G-SPA) 
decoding for joint LDPC and physical-layer network 
coding over the asynchronous multiple-access channel. For 
convenience, we focus on BPSK signalling. However, its 
generalization to QPSK signalling is straightforward [9]. 

For $\mathbf{H} = [h_{m,n}]$ and an eligible codeword 
$(c_0, c_1, \cdots, c_{N-1})$, we have 

\[
\sum_{n} h_{m,n}\, c_n = 0.
\]

For the G-SPA decoding, one can consider a virtual combined 
encoder which maps the messages generated by both 
sources A and B into the virtual codeword $\mathbf{c}^{ab}(D) = 
[c_0^{ab}(D), \cdots, c_{N-1}^{ab}(D)]$, where $c_n^{ab}(D) = c_a(n) + c_b(n) D$. 
Let $h_{m,n}^{G}(D) = 1$ if $h_{m,n} = 1$, and zero otherwise. Hence, each 
virtual codeword $\mathbf{c}^{ab}(D)$ can be seen as a codeword with 
elements taken from GF($2^2$), and its corresponding parity-check 
matrix $\mathbf{H}^{G}$ takes values from GF($2^2$) with the special 
constraint $\mathbf{H}^{G} = \mathbf{H}$. Finally, a virtual 4-ary LDPC coding 
scheme is obtained with the codeword $\mathbf{c}^{ab}(D)$ satisfying 

\[
\sum_{n} h_{m,n}^{G}(D)\, c_n^{ab}(D) = 0 \mod (1 + D + D^2).
\tag{27}
\]

n 

This insight can be employed to develop a generalized 
SPA over GF($2^2$), which is a simpler version of the standard 
SPA employed in GF($2^2$)-LDPC coding schemes. Indeed, the 
permutation step inherent in the standard SPA for decoding 
non-binary LDPC codes can be omitted entirely thanks to 
the special form of the parity-check matrix. 
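
The virtual 4-ary code can be illustrated with a short Python sketch. It assumes that the symbol $c_a(n) + c_b(n) D$ is stored as the integer $c_a(n) + 2 c_b(n)$, so that GF($2^2$) addition becomes a bitwise XOR of integers; since every non-zero entry of $\mathbf{H}^{G}$ equals 1, no field multiplication is required. The (7,4) Hamming matrix and the two codewords are toy stand-ins chosen for illustration.

```python
# Parity-check matrix of a (7,4) Hamming code (toy stand-in for H).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def to_symbols(c_a, c_b):
    """Map bit pairs to GF(4) symbols c_a + c_b*D, stored as c_a + 2*c_b."""
    return [a + 2 * b for a, b in zip(c_a, c_b)]

def gf4_syndrome(H, symbols):
    """Check sums of (27); GF(4) addition is XOR of the integer labels."""
    out = []
    for row in H:
        acc = 0
        for h, v in zip(row, symbols):
            if h:
                acc ^= v
        out.append(acc)
    return out

# Two valid binary codewords of the toy code.
c_a = [1, 1, 1, 0, 0, 0, 0]
c_b = [1, 0, 0, 1, 1, 0, 0]

symbols = to_symbols(c_a, c_b)
valid = gf4_syndrome(H, symbols)

# Flipping one bit of c_b violates every check that involves position 0.
c_b_bad = [1 - c_b[0]] + c_b[1:]
corrupted = gf4_syndrome(H, to_symbols(c_a, c_b_bad))
```

Each 4-ary check is satisfied exactly when the corresponding binary checks of both component codewords are satisfied, which is why the permutation step can be skipped.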

We denote the set of variables that participate in check $m$ 
by $\mathcal{N}(m) = \{n : h_{m,n}^{G}(D) = 1\}$. Similarly, we denote the set 
of checks in which variable $n$ participates by $\mathcal{M}(n) = \{m : 
h_{m,n}^{G}(D) = 1\}$. We denote by $\mathcal{N}(m) \backslash n$ the set $\mathcal{N}(m)$ with 
variable $n$ excluded and by $\mathcal{M}(n) \backslash m$ the set $\mathcal{M}(n)$ with 
check $m$ excluded. 

We also denote by $\mathcal{V}_m$ the subset of variables corresponding 
to the non-zero elements in the $m$th row of $\mathbf{H}^{G}$, by 
GF(4) $= \{\alpha_0, \alpha_1, \alpha_2, \alpha_3\}$ the finite field of size 4, by 
$L(v = \alpha_i) = \ln\left(\Pr(v = \alpha_i \mid \mathbf{r}_0^{N-1})\right)$ the log value of the APP 
$\Pr(v = \alpha_i \mid \mathbf{r}_0^{N-1})$, and by $\mathbf{L}(v) = [L(v = \alpha_0), L(v = \alpha_1), L(v = 
\alpha_2), L(v = \alpha_3)]$ its vector form. For the Log-G-SPA, the 
variable-to-check message from $n$ 
to $m$ is denoted by $\mathbf{L}_{n,m}(v_n)$, while the check-to-variable 
message from $m$ to $n$ is denoted by $\mathbf{L}_{m,n}(v_n)$. The notations 
$v_n$ and $\mathbf{c}_{ab}(n)$ are used interchangeably. 

For LDPC coded BPSK signalling over asynchronous 
MACs, the Log-G-SPA can be formally stated as follows. 

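A. Log-G-SPA 
S1: Initialization: 

    I1: A priori information for LDPC decoding; 

    I2: A priori information for the $\mathcal{B}_{\mathrm{MAC}}$ algorithm: 
        $L_i(\mathbf{c}_{ab}(n)) = 0$. 
S2: Implement the Log-BCJR algorithm for the 
asynchronous MAC: 

    $\mathcal{B}_{\mathrm{MAC}}\left(\mathbf{r}_0^{N-1}, L_i(\mathbf{c}_{ab}(n)), L_e(\mathbf{c}_{ab}(n))\right)$, 
and output: 

    $\mathbf{L}_{n,m}(v_n) = L_e(\mathbf{c}_{ab}(n)) = [\ln(\Pr(v_n = \alpha_i \mid \mathbf{r}_0^{N-1}))]$, 

which initializes the variable-to-check messages. 
For the first iteration, 

    $\mathbf{L}_n(v_n) = L_e(\mathbf{c}_{ab}(n))$. 

S3: Hard decision: 

    $\hat{v}_n = \arg\max \mathbf{L}_n(v_n) := \hat{c}_a(n) + \hat{c}_b(n) D$; 
    $\hat{\mathbf{c}}_r = [\hat{c}_r(0), \cdots, \hat{c}_r(N-1)]$, $\hat{c}_r(n) = \hat{c}_a(n) \oplus \hat{c}_b(n)$; 

    If ($\mathbf{H} \hat{\mathbf{c}}_r == \mathbf{0}$) 

        output $\hat{\mathbf{c}}_r$ and terminate the decoding; 
S4: Check node processing: 

    $\mathbf{L}_{m,n}(v_n) = \bigoplus_{n' \in \mathcal{N}(m) \backslash n} \mathbf{L}_{n',m}(v_{n'})$, 

    s.t. $\sum_{n' \in \mathcal{N}(m)} h_{m,n'}\, v_{n'} = 0$; 

S5: A posteriori information computation: 

    $\mathbf{L}_n(v_n) = \mathbf{L}_n(v_n) + \sum_{m \in \mathcal{M}(n)} \mathbf{L}_{m,n}(v_n)$; 

S6: Variable node processing: 

    $\mathbf{L}_{n,m}(v_n) = \mathbf{L}_n(v_n) - \mathbf{L}_{m,n}(v_n)$; 

and extrinsic information extraction: 

    $L_i(v_n) = \mathbf{L}_n(v_n) - L_e(v_n)$; 

Go to step S2. 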
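
The hard-decision step S3 can be sketched in Python as follows. The sketch assumes that the GF(4) element $\alpha_i$ with index $i = c_a + 2 c_b$ represents $c_a + c_b D$ (a labelling chosen here for illustration) and that larger log-APP values indicate more likely symbols; the toy (7,4) Hamming matrix and the synthetic log-APP vectors are likewise hypothetical.

```python
# Parity-check matrix of a (7,4) Hamming code (toy stand-in for H).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def hard_decision(llr_vectors):
    """Pick the most likely GF(4) symbol per position and XOR its two bits."""
    c_r = []
    for L in llr_vectors:
        sym = max(range(4), key=lambda i: L[i])  # index of the largest log-APP
        c_a, c_b = sym & 1, sym >> 1             # assumed labelling c_a + 2*c_b
        c_r.append(c_a ^ c_b)
    return c_r

def syndrome_ok(H, c):
    """True if H c = 0 over GF(2), i.e., the stopping rule of S3 fires."""
    return all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)

# Synthetic log-APP vectors that favour the symbols of two valid codewords.
c_a = [1, 1, 1, 0, 0, 0, 0]
c_b = [1, 0, 0, 1, 1, 0, 0]
llr_vectors = []
for a, b in zip(c_a, c_b):
    L = [-5.0] * 4
    L[a + 2 * b] = 0.0
    llr_vectors.append(L)

c_r = hard_decision(llr_vectors)
stop = syndrome_ok(H, c_r)
```

Note that the stopping rule only tests the XOR codeword; it does not by itself guarantee that both component codewords were decoded correctly.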

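The check node processing function can be computed 
recursively as 

\[
\bigoplus_{v_n \in \mathcal{V}_m} \mathbf{L}(v_n)
= \mathbf{L}(v_1) \oplus \left( \bigoplus_{v_n \in \mathcal{V}_m \backslash v_1} \mathbf{L}(v_n) \right),
\tag{28}
\]

where 

\[
\mathbf{L}(v_1) \oplus \mathbf{L}(v_2) := \mathbf{L}(v_1 + v_2) = \left[ L(v_1 + v_2 = \alpha_i) \right],
\tag{29}
\]

and 

\[
L(v_1 + v_2 = \alpha_i)
= \ln\left( \sum_{x \in \mathrm{GF}(4)} e^{L(v_1 = x) + L(v_2 = \alpha_i - x)} \right)
- \ln\left( \sum_{x \in \mathrm{GF}(4)} \sum_{y \in \mathrm{GF}(4)} e^{L(v_1 = x) + L(v_2 = y)} \right).
\tag{30}
\]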
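
The pairwise operation in (29) and (30) is a log-domain convolution over GF(4). A minimal Python sketch is given below; it assumes that the field elements are labelled by the integers 0 to 3 so that both addition and subtraction in GF(4) reduce to a bitwise XOR of labels, and it normalizes the result so that the output is again a vector of log-probabilities. The function names and the example probability vectors are illustrative.

```python
import math

def logsumexp(vals):
    """Numerically stable ln(sum(exp(v))) for a list of floats."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def check_combine(L1, L2):
    """Log-domain distribution of v1 + v2 over GF(4), following (30).

    In GF(4), alpha_i - x equals alpha_i + x, i.e., XOR of integer labels.
    """
    out = [logsumexp([L1[x] + L2[i ^ x] for x in range(4)]) for i in range(4)]
    norm = logsumexp(out)  # equals the double sum over all pairs (x, y)
    return [v - norm for v in out]

# Hypothetical input distributions, converted to the log domain.
P1 = [0.7, 0.1, 0.1, 0.1]
P2 = [0.1, 0.7, 0.1, 0.1]
L_sum = check_combine([math.log(p) for p in P1], [math.log(p) for p in P2])
P_sum = [math.exp(v) for v in L_sum]
```

Applying `check_combine` repeatedly along a check, as in (28), yields the message for a check of any degree.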

B. Comments 

The presented log-domain version is, in essence, tutorial. 
In [8], the G-SPA is proposed explicitly for decoding 
LDPC-coded modulation over synchronous MACs. Then, the 
authors in [10] developed the G-SPA decoding of RA-coded 
modulation over the symbol-asynchronous MAC. 

Clearly, the presented Log-G-SPA for LDPC-coded asynchronous 
relaying is the cascade of two message-passing sub-algorithms: 
the Log-BCJR algorithm ($\mathcal{B}_{\mathrm{MAC}}$) 
for the asynchronous PNC and the Log-G-SPA for the two-user 
LDPC codes. In its current form, the two sub-algorithms 
each run one iteration in turn and then exchange 
information iteratively. In practice, the rates of convergence of 
these two sub-algorithms are different, and 
decoding of the LDPC code is often slower. Let us call the 
iterations of $\mathcal{B}_{\mathrm{MAC}}$ the outer iterations, and the iterations of the 
Log-G-SPA decoding of the two-user LDPC codes the inner iterations. Hence, 
one can run more than one inner iteration of the Log-G-SPA 
decoding of the two-user LDPC codes for each outer 
iteration. This is discussed further in the simulations. 

Fig. 2. Frame asynchronism between sources A and B. 

IV. Frame-asynchronous Bi-directional Relaying 

A. Channel Model 

In Section II, the symbol-asynchronous multiple-access 
channel model is considered, where the relative delay $\delta = 
\frac{\tau_b - \tau_a}{T}$ is restricted to $0 \le \delta < 1$. In practice, the asynchronism 
between sources A and B cannot be controlled so precisely. 
The frame asynchronism should also be seriously 
considered. Hence, one can consider $\delta = \iota + \epsilon$ as shown 
in Fig. 2, where $\iota$ is a nonnegative integer with 
$0 \le \iota \ll N$ and $\epsilon$ is a fractional value with $0 \le \epsilon < 1$. The condition 
$\iota \ll N$ means that some control mechanism for synchronized 
transmission between sources A and B is still required in 
the multiple-access phase, which, however, can be reasonably 
relaxed. 

Let $\mathbf{c}_{ab}(k, \iota) = [c_a(k), c_b(k - \iota)]^T$ and $\mathbf{r}(k) = [r_a(k), r_b(k)]^T$. Then, 
the formulation (15) still holds in the form 

\[
\mathbf{r}(k) = \Psi\left(\{\mathbf{c}_{ab}(k - l, \iota)\}_{l=0}^{L}\right) + \mathbf{n}(k).
\tag{31}
\]



B. Cyclic Codes for Combating the Frame-asynchronism 

For frame-asynchronous bi-directional relaying, it is 
natural to require the LDPC code to have the following property: 

\[
\mathbf{H}\left(\mathbf{c}_a \oplus \mathbf{c}_b^{(\iota)}\right) = \mathbf{0},
\tag{32}
\]

where $\mathbf{c}_b^{(\iota)} = (c_b(N - \iota), c_b(N - \iota + 1), \cdots, c_b(N - 1), 
c_b(0), c_b(1), \cdots, c_b(N - \iota - 1))$ is the $\iota$-cyclic shift of $\mathbf{c}_b$. 
It means that 

\[
\mathbf{H} \mathbf{c}_b^{(\iota)} = \mathbf{0}.
\tag{33}
\]

Hence, the LDPC code should be a cyclic code. 
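
Properties (32) and (33) can be checked exhaustively on a small cyclic code. The Python sketch below uses the (7,4) cyclic Hamming code generated by $g(x) = 1 + x + x^3$ as a toy stand-in for the cyclic LDPC code; the encoder and the helper names are assumptions made for illustration only.

```python
from itertools import product

N = 7
G_POLY = [1, 1, 0, 1]  # g(x) = 1 + x + x^3, which divides x^7 + 1 over GF(2)

def encode(msg):
    """Non-systematic encoding c(x) = m(x) g(x) over GF(2)."""
    c = [0] * N
    for i, m in enumerate(msg):
        if m:
            for j, g in enumerate(G_POLY):
                c[i + j] ^= g
    return tuple(c)

def cyclic_shift(c, s):
    """Shift so that position n of the output holds c[(n - s) mod N]."""
    s %= len(c)
    return c[-s:] + c[:-s] if s else c

codebook = {encode(m) for m in product((0, 1), repeat=4)}

# (33): every cyclic shift of a codeword is again a codeword.
shift_closed = all(
    cyclic_shift(c, s) in codebook for c in codebook for s in range(N)
)

# (32): XOR of a codeword with any shifted codeword stays in the code.
xor_closed = all(
    tuple(a ^ b for a, b in zip(ca, cyclic_shift(cb, s))) in codebook
    for ca in codebook for cb in codebook for s in range(N)
)
```

A generic, non-cyclic LDPC code would fail the first check for most shifts, which is why the frame-asynchronous scheme needs the cyclic structure.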

In this manner, one can still construct a virtual combined 
encoder which maps the messages generated by both 
sources A and B into the virtual codeword $\mathbf{c}^{ab}(D) = 
[c_0^{ab}(D), \cdots, c_{N-1}^{ab}(D)]$, where $c_n^{ab}(D) = c_a(n) + c_b(n - \iota) D$ 
with $n - \iota$ taken modulo $N$. Then, a virtual 4-ary LDPC 
coding scheme is again obtained with the codeword $\mathbf{c}^{ab}(D)$ 
satisfying 

\[
\sum_{n} h_{m,n}^{G}(D)\, c_n^{ab}(D) = 0 \mod (1 + D + D^2)
\tag{34}
\]

with the special constraint $\mathbf{H}^{G} = \mathbf{H}$. 

Let $\iota_{\max}$ be the integer part of the potential maximum 
delay, i.e., $-\iota_{\max} \le \iota \le \iota_{\max}$. For the received signal 
vector $\mathbf{r}_0^{N-1}$, the $\iota$ elements at both the head and the tail of 
$\mathbf{r}_0^{N-1}$ can be thought of as the interference part. Let 
$\tilde{\mathbf{r}}_0^{N-1} = [\mathbf{0}_{\iota_{\max}}, \mathbf{r}_{\iota_{\max}}^{N-1-\iota_{\max}}, \mathbf{0}_{\iota_{\max}}]$, 
which can be seen as the worst 
case for extracting the a posteriori information for $\mathbf{c}_{ab}(k, \iota)$. 
With the Log-BCJR algorithm for the asynchronous MAC, 
i.e., $\mathcal{B}_{\mathrm{MAC}}\left(\tilde{\mathbf{r}}_0^{N-1}, L_i(\mathbf{c}_{ab}(k, \iota)), L_e(\mathbf{c}_{ab}(k, \iota))\right)$, one can obtain 
the estimate of $\ln(\Pr(\mathbf{c}_{ab}(k, \iota) \mid \tilde{\mathbf{r}}_0^{N-1}))$. If we assume that 
the transmission of LDPC-coded packets is continuous for both 
sources A and B, it is clear that both the head and the tail of 
each superimposed packet may be corrupted by the past and 
future packets. Indeed, the number of corrupted symbols is $2\iota$ 
for each LDPC frame, which results in a possible SNR 
loss of $10 \log_{10}\left(\frac{N}{N - 2\iota}\right)$ dB. If $\iota \ll N$, the SNR loss is 
minor. 

With cyclic LDPC codes applied at both sources A and 
B, the Log-G-SPA algorithm presented in Section III can 
again be employed to obtain the estimate of $\mathbf{c}_a \oplus \mathbf{c}_b^{(\iota)}$. However, 
the relative delay $\iota$ should be resolved. 

C. Resolving the Relative Delay $\iota$ 

For the sources to correctly decode the messages, it is 
essential to resolve the unknown delay $\iota$. Here, we propose two 
potential mechanisms to solve this problem. Let $T^{(i)}(\mathbf{r}_0^{N-1})$ 
denote the output LLR vector after the $i$-th message-passing 
decoding iteration with the channel input vector $\mathbf{r}_0^{N-1}$. Consider the 
case of continuous transmission from both sources A and 
B to the relay R. For the received signal vector $\mathbf{r}_0^{N-1}$, the 
$\iota$ elements at both the head and the tail of $\mathbf{r}_0^{N-1}$ can be 
thought of as the interference part. In general, the interference 
part is detrimental to successful decoding of the XOR codeword 
$(\mathbf{c}_a \oplus \mathbf{c}_b^{(\iota)})$. Hence, it is natural to replace the interference part 
with the all-zero vector. Based on this observation, we propose 
the following estimator: 

\[
\max_{\iota \in \mathcal{U}} \; \cdots
\tag{35}
\]

In simulations, we found that this method does not work well, 
as the contribution of the interference part is minor when the 
value of $\iota$ is small. 

To correctly resolve the delay $\iota$ with high probability, we 
propose to employ cyclic redundancy check (CRC) codes to 
identify the message. CRCs are specifically designed to protect 
against common types of errors on communication channels, 
where they can provide quick and reasonable assurance of 
the integrity of the messages delivered. In the considered scenario, 
a CRC is appended to each message packet, which is then 
LDPC encoded for transmission at each source node. 

With the Log-G-SPA decoding developed in Section III, 
the relay decodes the received signal and can output two 
codewords $\hat{\mathbf{c}}_a$ and $\hat{\mathbf{c}}_b^{(\iota)}$. Then the codeword $\hat{\mathbf{c}}_b^{(\iota)}$ is cyclically 
shifted by $\iota = -\iota_{\max}, \cdots, \iota_{\max}$ positions and CRC checking 
is performed to resolve the true value of $\iota$. Simulations show 
that this mechanism works well. 
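
The CRC-based search over candidate shifts can be sketched in Python as follows. To keep the example short and self-contained, it makes several simplifying assumptions that differ from the scheme above: the LDPC code is omitted (the cyclic shift acts directly on the packet bytes), decoding is assumed error-free, and the 32-bit CRC from `zlib.crc32` is used instead of a CRC-16. All function names and the random payload are hypothetical.

```python
import random
import zlib

def add_crc(msg: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the message."""
    return msg + zlib.crc32(msg).to_bytes(4, "big")

def crc_ok(packet: bytes) -> bool:
    """True if the trailing 4 bytes match the CRC-32 of the payload."""
    return zlib.crc32(packet[:-4]) == int.from_bytes(packet[-4:], "big")

def cyclic_shift(seq: bytes, s: int) -> bytes:
    """Cyclic shift by s positions (negative s shifts the other way)."""
    s %= len(seq)
    return seq[-s:] + seq[:-s] if s else seq

def resolve_shift(received: bytes, max_shift: int):
    """Try every candidate delay and return the first that passes the CRC."""
    for s in range(-max_shift, max_shift + 1):
        if crc_ok(cyclic_shift(received, -s)):
            return s
    return None

rng = random.Random(1)
packet = add_crc(bytes(rng.randrange(256) for _ in range(32)))

true_shift = 5
estimated = resolve_shift(cyclic_shift(packet, true_shift), max_shift=8)
estimated_neg = resolve_shift(cyclic_shift(packet, -3), max_shift=8)
```

With a short CRC, a wrong shift can occasionally pass the check; the probability of such a false match per candidate is roughly $2^{-r}$ for an $r$-bit CRC, so it grows with the size of the search window.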

If the JCNC scheme is employed, the situation is different. 
It is still possible to resolve the delay, but this task 
has to be completed in the broadcast phase. In this case, 
the relay decodes the received signal and outputs the codeword 
$\hat{\mathbf{c}}_r = (\mathbf{c}_a \oplus \mathbf{c}_b^{(\iota)})$ with unknown $\iota$ in the MAC phase. Then 
the codeword $\hat{\mathbf{c}}_r$ is broadcast to 
both sources. At source A, the message-passing decoder 
is run to get an estimate of $\mathbf{c}_r$, which is further 
processed with its own codeword $\mathbf{c}_a$ to get the estimate of 
$\mathbf{c}_b^{(\iota)}$. Then, $\hat{\mathbf{c}}_b^{(\iota)}$ is cyclically shifted and CRC checking is 
performed to identify the proper value of the shift $\iota$. At 
source B, the message-passing decoder is also run 
to get an estimate of $\mathbf{c}_r$. Then, $\mathbf{c}_b$ is cyclically shifted to get 
the candidates for $\mathbf{c}_b^{(\iota)}$. Each cyclically shifted copy of $\mathbf{c}_b$ is 
XORed with the decoded $\hat{\mathbf{c}}_r$ to obtain an estimate 
of $\mathbf{c}_a$. Finally, CRC checking is again performed to identify 
the correctly extracted codeword $\mathbf{c}_a$. 

V. Simulation Results 

In simulations, a square-root-raised-cosine filter is employed 
for pulse-shaping at the transmitter. The roll-off factor $\beta$ is 
chosen to be $\beta = 1$ for both sources A and B. The 
bottleneck of the bi-directional relaying system lies in the 
processing capability of the relay node and its performance: 
for the joint LDPC and PNC scheme, it is the duty of the relay 
to reproduce the XOR of both source codewords. Hence, we 
mainly focus on the performance of the XOR codeword $\mathbf{c}_r$. 
The performance is closely related to the energy per bit and 
the received noise variance. BPSK modulation is adopted. 

Two LDPC codes are considered. The first one is a (3,6)- 
regular Mackay-Neal LDPC code with codewords of length 
N = 1008 03], and the second one is a cyclic LDPC code 
derived from finite-geometry codes with codeword length 
$N = 1365$ and information length $K = 765$ [15]. Let us 
denote the maximum number of outer iterations ($\mathcal{B}_{\mathrm{MAC}}$) by 
$n_o = 4$, and the maximum number of inner (LDPC) decoding 
iterations per outer iteration by $n_i$. 

In simulations, we consider the normalized equal-power 
complex MAC, namely, $|h_a| = |h_b| = 1$. This 
complex MAC is often characterized by the carrier-phase 
offset $\Delta\theta = \theta_b - \theta_a$, where $h_a = |h_a| e^{j\theta_a}$ and $h_b = |h_b| e^{j\theta_b}$ 
are complex variables. Throughout the simulations, the symbol 
misalignment with $\epsilon = 0.5$ is considered. 




[Plot: BER versus Eb/N0 (dB).] 
Fig. 3. The effect of the number of outer iterations for the Log-G-SPA 
decoding. 

Firstly, the effect of $n_o$, the number of $\mathcal{B}_{\mathrm{MAC}}$ iterations, on 
the system performance is investigated. The total number of 
inner LDPC decoding iterations is fixed to $n_i \cdot n_o = 20$. For the 
(3,6)-regular MacKay-Neal LDPC code, the BER performance 
is shown in Fig. 3 for the normalized equal-power complex 
MAC with $\Delta\theta = \pi/4$. As shown, the exchange of 
information between $\mathcal{B}_{\mathrm{MAC}}$ and the LDPC decoder is 
beneficial for performance, and $n_o = 4$ is enough 
for the considered case. 

Fig. 4. The bit-error-rate performance of the (1008,504) MacKay-Neal LDPC 
code over symbol-asynchronous and synchronous MACs with different values of $\Delta\theta$. 

Secondly, the BER performance of Log-G-SPA decoding 
with symbol asynchronism is presented for various values 
of $\Delta\theta$ and compared to that of JCNC 
decoding. Fig. 4 shows that the Log-G-SPA performs 
significantly better than the JCNC, and that, under symbol 
asynchronism, it is also more robust to $\Delta\theta$ than the 
JCNC [10]. For JCNC decoding, the performance under 
symbol asynchronism deteriorates compared to the case 
of the synchronous MAC (Syn-MAC). 

Fig. 5. The bit-error-rate performance of the (1365,765) cyclic LDPC code over 
frame-asynchronous and synchronous MACs with $\Delta\theta = \pi/4$. 

Finally, we investigate the effect of both frame and symbol 
misalignments on the system performance. Hence, the cyclic 
LDPC code is employed and the maximum number of itera- 
tions are set to n a — 4,ni — 10. For frame misalignment, l 
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is assumed to be randomly picked from [—8,8]. The BER 
performance is shown in Fig. [5] for the normalized equal- 
power complex MAC channel with A9 — 7r/4. As shown, 
the performance degradation due to the frame misalignment 
is less than 0.2 dB in this case. If the mechanism of (35) is 
incorporated, the performance gap can be further narrowed, 
but the complexity increases to about 2l times that 
of the original algorithm. To resolve the relative delay $l$, 
a CRC-16 is appended to the source message of length 
$K_m = K - 16 = 749$ for both sources A and B. In simulations, 
the resolution process is initiated only when the parity checks 
of both LDPC codes are satisfied, i.e., in the error-free case. We 
found that the resolution process always works and the correct 
value of $l$ can be found once the parity checks of the LDPC codes 
are satisfied. 
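
The delay-resolution step can be illustrated with a short Python sketch that tests every candidate shift in $[-8, 8]$ against a CRC-16. It is only a toy under stated assumptions and not the simulation code used here: the 765-bit CRC-protected word stands in directly for the decoded information block (no LDPC encoding or decoding is modelled), the polynomial 0x1021 and the initial register 0xFFFF are our own choices because the text specifies only "CRC-16", and all function names are hypothetical.

```python
import random

CRC16_POLY = 0x1021   # assumed polynomial (CRC-16/CCITT); the text only says "CRC-16"
CRC16_INIT = 0xFFFF   # assumed nonzero initial register


def crc16(bits):
    """Return the CRC-16 register after clocking in a list of 0/1 values."""
    reg = CRC16_INIT
    for b in bits:
        feedback = ((reg >> 15) & 1) ^ b
        reg = (reg << 1) & 0xFFFF
        if feedback:
            reg ^= CRC16_POLY
    return reg


def append_crc(message_bits):
    """Return message_bits followed by its 16 CRC bits, most significant first."""
    reg = crc16(message_bits)
    return list(message_bits) + [(reg >> (15 - i)) & 1 for i in range(16)]


def crc_ok(word_bits):
    """A word of the form message + CRC drives the register to zero."""
    return crc16(word_bits) == 0


def cyclic_shift(word, l):
    """Rotate word to the right by l positions (to the left if l is negative)."""
    k = l % len(word)
    return word[len(word) - k:] + word[:len(word) - k]


def resolve_delay(shifted_word, max_delay):
    """Return every l in [-max_delay, max_delay] whose undone shift passes the CRC."""
    return [l for l in range(-max_delay, max_delay + 1)
            if crc_ok(cyclic_shift(shifted_word, -l))]


if __name__ == "__main__":
    rng = random.Random(1)
    message = [rng.randint(0, 1) for _ in range(749)]   # K_m = 749
    word = append_crc(message)                          # K = 765
    true_l = rng.randint(-8, 8)
    candidates = resolve_delay(cyclic_shift(word, true_l), 8)
    print("true delay:", true_l, "candidates:", candidates)
```

The true delay is always among the returned candidates. A nonzero initial register is assumed because, with an all-zero register, a shifted word whose wrapped-around bits are all zero would also pass the check. More than one candidate can in principle still pass, which is why the sketch returns a list rather than a single value.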

VI. Conclusion and Future Work 

We have presented a general channel model for the asynchronous 
multiple-access channel with arbitrary pulse-shaping, 
as typically encountered in bi-directional relaying. By invoking 
the WMF technique, one arrives at an equivalent vector ISI 
channel, to which the well-known BCJR algorithm can be 
applied to obtain the optimal APPs. 

We have also presented a formal Log-G-SPA decoding algorithm for 
LDPC-coded BPSK signalling over asynchronous MACs. To 
further combat the frame asynchronism, cyclic LDPC 
codes, along with the CRC technique, are proposed. Together, 
the proposed techniques can be employed to overcome the 
asynchronism in bi-directional relaying. 

There are several issues to be explored in the future. For 
either continuous or burst transmission of LDPC frames, the 
proposed methods work only with limited frame misalignment, 
i.e., a small value of $l$ relative to the codeword length. 
This generally requires some control mechanism 
for cooperative transmission between sources A and B. If no 
such control mechanism is to be used, one must seriously consider the 
case of large values of $l$. In this case, one has to base decoding on several 
LDPC frames, and the interference cancellation 
technique can be helpful for successful decoding and correct 
resolution of the unknown delay. 

The Log-G-SPA decoding has shown its performance advantage 
when both sources A and B employ the same LDPC 
code. However, it remains unknown how to design LDPC 
codes that maximize the system performance over either 
synchronous or asynchronous multiple-access channels. Furthermore, 
compared with the Log-SPA decoding of non-binary 
LDPC codes, the complexity of Log-G-SPA decoding for 
LDPC-coded transmission over MACs is somewhat lower, 
since the permutation step is not required. However, more 
work remains to be done for its practical implementation. 

Acknowledgment 

The authors wish to thank Dr. Qin Huang for providing the 
parity-check matrix of the (1365, 765) cyclic LDPC code. We 
also thank Dr. Ming Jiang for helpful discussions. 



References 

[1] R. Koetter and M. Médard, "An algebraic approach to network coding," 
IEEE/ACM Trans. Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003. 

[2] P. Chou and Y. Wu, "Network coding for Internet and wireless networks," 
IEEE Signal Processing Magazine, pp. 77-85, Sep. 2007. 

[3] P. Popovski and H. Yomo, "Physical network coding in two-way wireless 
relay channels," in Proc. IEEE ICC, Glasgow, Scotland, Oct. 2007. 

[4] S. Zhang, S. Liew, and P. Lam, "Hot topic: Physical layer network 
coding," in Proc. International Conference on Mobile Computing and 
Networking (MobiCom), Los Angeles, CA, USA, 2006, pp. 358-365. 

[5] S. Zhang, S. C. Liew, and P. P. Lam, "Physical-layer network 
coding," in ACM MobiCom '06, 2006. 

[6] S. Zhang and S. C. Liew, "Channel coding and decoding in a relay 
system operated with physical-layer network coding," IEEE Jour. Select. 
Areas in Comm., vol. 27, pp. 788-796, Jun. 2009. 

[7] X. Wu, W. Zeng, C. Zhao, and X. You, "Joint network and LDPC coding 
for bi-directional relaying," http://arxiv.org/abs/1105.2422, May 2011. 

[8] D. Wübben and Y. Lang, "Generalized sum-product algorithm for joint 
channel decoding and physical-layer network coding in two-way relay 
systems," in Proc. IEEE Global Communications Conference (GLOBECOM), 
Miami, FL, USA, Nov. 2010. 

[9] D. Wübben, "Joint channel decoding and physical-layer network coding 
in two-way QPSK relay systems by a generalized sum-product algorithm," 
in Proc. 7th International Symposium on Wireless Communication 
Systems (ISWCS), York, UK, 2010. 

[10] L. Lu and S. C. Liew, "Asynchronous physical network coding," 
http://arxiv.org/abs/1105.3144, May 2011. 

[11] S. Verdú, "The capacity region of the symbol-asynchronous Gaussian 
multiple-access channel," IEEE Trans. Inform. Theory, vol. 35, 
pp. 733-751, 1989. 

[12] A. Duel-Hallen, "A family of multiuser decision-feedback detectors 
for asynchronous code-division multiple-access channels," IEEE Trans. 
Commun., vol. 43, pp. 421-434, Feb./Mar./Apr. 1995. 

[13] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of 
linear codes for minimizing symbol error rate," IEEE Trans. Inform. 
Theory, vol. 20, pp. 284-287, Mar. 1974. 

[14] J. Hagenauer, "Iterative decoding of binary block and convolutional 
codes," IEEE Trans. Inform. Theory, vol. 42, pp. 429-445, Mar. 1996. 

[15] D. MacKay, "Good error-correcting codes based on very sparse matri- 
ces," IEEE Trans. Inform. Theory, vol. 45, pp. 399-431, Mar. 1999. 

[16] Q. Huang, Q. Diao, S. Lin, and K. A. S. Abdel-Ghaffar, "Cyclic and 
quasi-cyclic LDPC codes on row and column constrained parity-check 
matrices and their trapping sets," http://arxiv.org/abs/1012.3201, 
Dec. 2010. 



